AITopics

Country:

Asia (0.68)
North America > United States (0.68)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)

Genre: Research Report (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Neural Information Processing SystemsFeb-11-2026, 07:05:51 GMT

Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering

Retrieval (DPR) and then uses them to answer questions.

large language model, machine learning, question answering, (19 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(4 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
(2 more...)

Neural Information Processing SystemsDec-25-2025, 21:01:15 GMT

Knowledge Distillation for High Dimensional Search Index

Lightweight compressed models are prevalent in Approximate Nearest Neighbor Search (ANNS) and Maximum Inner Product Search (MIPS) owing to their superiority of retrieval efficiency in large-scale datasets. However, results given by compressed methods are less accurate due to the curse of dimension and the limitations of optimization objectives (e.g., lacking interactions between queries and documents). Thus, we are encouraged to design a new learning algorithm for the compressed search index on high dimensions to improve retrieval performance. In this paper, we propose a novel KnowledgeDistillation for high dimensional search index framework (KDindex), with the aim of efficiently learning lightweight indexes by distilling knowledge from high-precision ANNS and MIPS models such as graph-based indexes. Specifically, the student is guided to keep the same ranking order of the top-k relevant results yielded by the teacher model, which acts as the additional supervision signals between queries and documents to learn the similarities between documents. Furthermore, to avoid the trivial solutions that all candidates are partitioned to the same centroid, the reconstruction loss that minimizes the compressed error, and the posting list balance strategy that equally allocates the candidates, are integrated into the learning objective. Experiment results demonstrate that KDindex outperforms existing learnable quantization-based indexes and is 40 lighter than the state-of-the-art non-exhaustive methods while achieving comparable recall quality.

high dimensional search index, knowledge distillation, name change, (3 more...)

Technology:

Information Technology > Information Management > Search (0.85)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.59)

Neural Information Processing SystemsDec-24-2025, 23:42:38 GMT

Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering

Knowledge-based Visual Question Answering (KB-VQA) requires VQA systems to utilize knowledge from external knowledge bases to answer visually-grounded questions. Retrieval-Augmented Visual Question Answering (RA-VQA), a strong framework to tackle KB-VQA, first retrieves related documents with Dense Passage Retrieval (DPR) and then uses them to answer questions. This paper proposes Fine-grained Late-interaction Multi-modal Retrieval (FLMR) which significantly improves knowledge retrieval in RA-VQA. FLMR addresses two major limitations in RA-VQA's retriever: (1) the image representations obtained via image-to-text transforms can be incomplete and inaccurate and (2) similarity scores between queries and documents are computed with one-dimensional embeddings, which can be insensitive to finer-grained similarities.FLMR overcomes these limitations by obtaining image representations that complement those from the image-to-text transform using a vision model aligned with an existing text-based retriever through a simple alignment network. FLMR also encodes images and questions using multi-dimensional embeddings to capture finer-grained similarities between queries and documents.

fine-grained late-interaction multi-modal retrieval, name change, ra-vqa, (8 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.82)
Information Technology > Artificial Intelligence > Natural Language (0.65)
Information Technology > Knowledge Management > Knowledge Engineering (0.59)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.59)

Sokli, Effrosyni, Kasela, Pranav, Peikos, Georgios, Pasi, Gabriella

Mixture of Experts Approaches in Dense Retrieval Tasks

arXiv.org Artificial IntelligenceOct-20-2025

Dense Retrieval Models (DRMs) are a prominent development in Information Retrieval (IR). A key challenge with these neural Transformer-based models is that they often struggle to generalize beyond the specific tasks and domains they were trained on. To address this challenge, prior research in IR incorporated the Mixture-of-Experts (MoE) framework within each Transformer layer of a DRM, which, though effective, substantially increased the number of additional parameters. In this paper, we propose a more efficient design, which introduces a single MoE block (SB-MoE) after the final Transformer layer. To assess the retrieval effectiveness of SB-MoE, we perform an empirical evaluation across three IR tasks. Our experiments involve two evaluation setups, aiming to assess both in-domain effectiveness and the model's zero-shot generalizability. In the first setup, we fine-tune SB-MoE with four different underlying DRMs on seven IR benchmarks and evaluate them on their respective test sets. In the second setup, we fine-tune SB-MoE on MSMARCO and perform zero-shot evaluation on thirteen BEIR datasets. Additionally, we perform further experiments to analyze the model's dependency on its hyperparameters (i.e., the number of employed and activated experts) and investigate how this variation affects SB-MoE's performance. The obtained results show that SB-MoE is particularly effective for DRMs with lightweight base models, such as TinyBERT and BERT-Small, consistently exceeding standard model fine-tuning across benchmarks. For DRMs with more parameters, such as BERT-Base and Contriever, our model requires a larger number of training samples to achieve improved retrieval performance. Our code is available online at: https://github.com/FaySokli/SB-MoE.

large language model, machine learning, natural language, (20 more...)

2510.15683

Country: Europe > Austria (0.28)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

B.2 Hierarchical k -means for semantic identifier The pseudo code of hierarchical k -means is detailed in in Algorithm 1. Algorithm 1: Hierarchical k -means.

Goswami, Dipam, Wang, Liying, Twardowski, Bartłomiej, van de Weijer, Joost

Query Drift Compensation: Enabling Compatibility in Continual Learning of Retrieval Embedding Models

arXiv.org Artificial IntelligenceOct-7-2025

Text embedding models enable semantic search, powering several NLP applications like Retrieval Augmented Generation by efficient information retrieval (IR). However, text embedding models are commonly studied in scenarios where the training data is static, thus limiting its applications to dynamic scenarios where new training data emerges over time. IR methods generally encode a huge corpus of documents to low-dimensional embeddings and store them in a database index. During retrieval, a semantic search over the corpus is performed and the document whose embedding is most similar to the query embedding is returned. When updating an embedding model with new training data, using the already indexed corpus is suboptimal due to the non-compatibility issue, since the model which was used to obtain the embeddings of the corpus has changed. While re-indexing of old corpus documents using the updated model enables compatibility, it requires much higher computation and time. Thus, it is critical to study how the already indexed corpus can still be effectively used without the need of re-indexing. In this work, we establish a continual learning benchmark with large-scale datasets and continually train dense retrieval embedding models on query-document pairs from new datasets in each task and observe forgetting on old tasks due to significant drift of embed-dings. We employ embedding distillation on both query and document embeddings to maintain stability and propose a novel query drift compensation method during retrieval to project new model query embeddings to the old embedding space. This enables compatibility with previously indexed corpus embeddings extracted using the old model and thus reduces the forgetting. We show that the proposed method significantly improves performance without any re-indexing.

large language model, machine learning, natural language, (19 more...)

2506.00037

Country:

Europe (0.93)
North America > United States (0.46)

Genre: Research Report (1.00)

Industry:

Media > Television (1.00)
Media > Film (1.00)
Leisure & Entertainment (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

arXiv.org Artificial IntelligenceSep-23-2025

The Role of Vocabularies in Learning Sparse Representations for Ranking

Kim, Hiun, Lee, Tae Kwan, Won, Taeryun

Learned Sparse Retrieval (LSR) such as SPLADE has growing interest for effective semantic 1st stage matching while enjoying the efficiency of inverted indices. A recent work on learning SPLADE models with expanded vocabularies (ESPLADE) was proposed to represent queries and documents into a sparse space of custom vocabulary which have different levels of vocabularic granularity. Within this effort, however, there have not been many studies on the role of vocabulary in SPLADE models and their relationship to retrieval efficiency and effectiveness. To study this, we construct BERT models with 100K-sized output vocabularies, one initialized with the ESPLADE pretraining method and one initialized randomly. After fine-tune on real-world search click logs, we applied logit score-based queries and documents pruning to max size for further balancing efficiency. The experimental result in our evaluation set shows that, when pruning is applied, the two models are effective compared to the 32K-sized normal SPLADE model in the computational budget under the BM25. And the ESPLADE models are more effective than the random vocab model, while having a similar retrieval cost. The result indicates that the size and pretrained weight of output vocabularies play the role of configuring the representational specification for queries, documents, and their interactions in the retrieval engine, beyond their original meaning and purposes in NLP. These findings can provide a new room for improvement for LSR by identifying the importance of representational specification from vocabulary configuration for efficient and effective retrieval.

information retrieval, machine learning, natural language, (15 more...)

2509.16621

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Caffagni, Davide, Sarto, Sara, Cornia, Marcella, Baraldi, Lorenzo, Cucchiara, Rita

Recurrence Meets Transformers for Universal Multimodal Retrieval

arXiv.org Artificial IntelligenceSep-12-2025

With the rapid advancement of multimodal retrieval and its application in LLMs and multimodal LLMs, increasingly complex retrieval tasks have emerged. Existing methods predominantly rely on task-specific fine-tuning of vision-language models and are limited to single-modality queries or documents. In this paper, we propose ReT-2, a unified retrieval model that supports multimodal queries, composed of both images and text, and searches across multimodal document collections where text and images coexist. ReT-2 leverages multi-layer representations and a recurrent Transformer architecture with LSTM-inspired gating mechanisms to dynamically integrate information across layers and modalities, capturing fine-grained visual and textual details. We evaluate ReT-2 on the challenging M2KR and M-BEIR benchmarks across different retrieval configurations. Results demonstrate that ReT-2 consistently achieves state-of-the-art performance across diverse settings, while offering faster inference and reduced memory usage compared to prior approaches. When integrated into retrieval-augmented generation pipelines, ReT-2 also improves downstream performance on Encyclopedic-VQA and InfoSeek datasets. Our source code and trained models are publicly available at: https://github.com/aimagelab/ReT-2

large language model, machine learning, natural language, (19 more...)

2509.08897

Country:

Europe (1.00)
North America > United States > Texas (0.28)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.66)

Industry: Leisure & Entertainment > Sports > Motorsports > Formula One (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)